signals to be discovered can only reside on a baseline [Greif, et
]. Therefore, estimating the baseline of a spectrum has been
n many areas [Greif, et al., 2008; de Sanctis, et al., 2011;
to, et al., 2019]. To successfully discover the chemicals from a
spectrum, accurate baseline estimation is thus highly desirable [Hastie
and Tibshirani, 1990; Price, et al., 2008; Vyumvuhore, et al., 2014;
-Agudelo, et al., 2017; Acikgoz, et al., 2018].
The whole-genome pattern discovery problem
DNA sequencing technology has improved continuously for
decades. Because of this, high-throughput sequencing data
have played an outstanding role in modern biology research.
Modern DNA sequencing technology, or next-generation
sequencing (NGS) technology, can generate sequencing count data for a
in less than an hour’s time. It has thus changed, shaped and
transformed modern biology and medicine research thoroughly [Hood and
2013]. Among sequencing projects, the most exciting is the human
genome project, which has had a huge impact on genome research
o, 1984; Collins and Galas, 1993; Collins and McKusick, 2001;
et al., 2017; Dunn, et al., 2018].
There are many subjects in whole-genome pattern discovery based
on count data. Among them, two are most relevant to machine
learning concepts. The first is how to analyse whole-genome
sequences for knowledge discovery, i.e., research on sequence
homology alignment approaches. It has been at least five decades since the
first sequence homology alignment algorithm was developed [Needleman
and Wunsch, 1970]. The most widely used one today,
BLAST, was developed three decades ago [Altschul, et al., 1990].
The earlier homology alignment algorithms align sequences pairwise
and mostly globally. With the huge increase in sequencing
data using NGS, the speed of sequence comparison is severely challenged,
especially when comparing a novel sequence against a database of